Goto

Collaborating Authors

 finite sum structure


Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Neural Information Processing Systems

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example.


Reviews: Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Neural Information Processing Systems

The paper proposes a method for optimization problems often found in machine learning tasks. The general loss function to minimize is of the form of a sum of smooth-convex functions associated with a convex regularization potential. The method is designed for the case of perturbation introduced in the data. Since the data sampling introduces a stochastic component Stochastic Gradient Descent (SGD) need of modifications for reducing the gradient variance [14,28]. In the case of perturbed data, such variance is magnified.


Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Bietti, Alberto, Mairal, Julien

Neural Information Processing Systems

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example.